Determining Asynchronous Pipeline Execution

نویسنده

  • Jeanne Ferrante
چکیده

Asynchronous pipelining is a form of parallelism in which processors execute diierent loop tasks (loop statements) as opposed to diierent loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems. Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coeecient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure. We generalize previous work on determining if a pipeline schedule will deadlock, and generalize Reiter's well-known formula 21] for determining the iteration interval b of a deadlock-free schedule, to account for nonzero communication times (easy) and the assignment of multiple tasks to processors (nontrivial). Two key components of our generalization are the use of pipeline scheduling edges, and the notion of negative data dependence distances (in a single unnested loop). We also discuss implementation of an asynchronous pipeline schedule at runtime; show how to eeciently simulate pipeline execution on a sequential processor; derive bounds on the startup time a; and discuss evaluation of the iteration interval formula, including development of a new algorithm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Analyzing Asynchronous Pipeline

Asynchronous pipelining is a form of parallelism which may be used in distributed memory systems. An asynchronous pipeline schedule is a generalization of a noniterative DAG schedule. Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop r...

متن کامل

Determining Asynchronous Acyclic Pipeline Execution Times

Pipeline execution is a form of parallelism in which subcomputations of a repeated computation, such as statements in the body of a loop, are executed in parallel. A measure of the execution time of a pipeline is needed to determine if pipelining is an effective form of parallelism for a loop, and to evaluate alternative scheduling choices. We derive a formula for precisely determining the asyn...

متن کامل

ARAS: Asynchronous RISC Architecture Simulator1

In this paper, an asynchronous pipeline instruction simulator, ARAS is presented. With this sim-ulator, one can design selected instruction pipelines and check their performance. Performance measurements of the pipeline connguration are obtained by simulating the execution of benchmark programs on the machine architectures developed. Depending on the simulation results obtained by using ARAS, t...

متن کامل

ARAS: asynchronous RISC architecture simulator

In this paper, an asynchronous pipeline instruction simulator, ARAS is presented. With this simulator, one can design selected instruction pipelines and check their performance. Performance measurements of the pipeline configuration are obtained by simulating the execution of benchmark programs on the machine architectures developed. Depending on the simulation results obtained by using ARAS, t...

متن کامل

Survey of the Counterflow Pipeline Processor Architectures

11. T H E ORIGINAL CFPP Abstract The Counterflow Pipeline Processor (CFPP) Architecture is a RISC-based pipeline processor [ l I. I t was proposed in 1994 as asynchronous processor architecture. Recently, researches have implemented it as synchronous processor architecture and later improved its design in terms of speed and performance by reducing average execution latency of instructions and m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996